Fast n-Fold Cross-Validation for Regularized Least-Squares

Authors

  • Tapio Pahikkala
  • Jorma Boberg
  • Tapio Salakoski
Abstract

Kernel-based learning algorithms have recently become state-of-the-art machine learning methods, of which support vector machines are the most popular. Regularized least-squares (RLS), another kernel-based learning algorithm that is also known as the least-squares support vector machine, has been shown to perform comparably to support vector machines in several machine learning tasks. In small-scale problems, RLS has several computational advantages over support vector machines. Firstly, the cross-validation (CV) performance of RLS on the training data can be calculated without retraining in each CV round; we give a formal proof of this claim. Secondly, the RLS solution can be computed for several different values of the regularization parameter in parallel. Finally, several problems on the same data set can be solved in parallel, provided that the same kernel function is used for each problem. We consider a simple implementation of the RLS algorithm for small-scale machine learning problems that takes advantage of all of the above properties. The implementation is based on the eigendecomposition of the kernel matrix. The proposed CV method for RLS is a generalization of the fast leave-one-out cross-validation (LOOCV) method for RLS that is widely known in the literature. For some tasks, LOOCV gives a poor performance estimate for the learning machine because of dependencies between the training data points. We demonstrate this experimentally by comparing the performance estimates given by LOOCV and CV in a ranking task on dependency parses generated from biomedical texts.
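The following is a minimal NumPy sketch, not the authors' implementation, of the idea behind the abstract: a single eigendecomposition K = V diag(d) V^T is reused for every regularization value and every output column, and both LOOCV and n-fold hold-out predictions are read off the hat matrix H = K (K + lam*I)^{-1} without retraining. The function names, the exact layout of the hold-out formula, and the toy data are illustrative assumptions.

```python
import numpy as np

def rls_eig(K, Y, regparams):
    """RLS via one eigendecomposition of the kernel matrix K, reused for
    every value in `regparams` and every output column of Y (several
    problems sharing the same kernel)."""
    d, V = np.linalg.eigh(K)                 # K = V diag(d) V^T, computed once
    VtY = V.T @ Y
    for lam in regparams:
        w = d / (d + lam)
        H = (V * w) @ V.T                    # hat matrix K (K + lam*I)^{-1}
        fit = H @ Y                          # training-set predictions
        a = V @ (VtY / (d + lam)[:, None])   # dual coefficients
        yield lam, a, fit, H

def loo_predictions(fit, Y, H):
    """Fast LOOCV: f_{-i}(x_i) = (f(x_i) - H_ii * y_i) / (1 - H_ii)."""
    h = np.diag(H)[:, None]
    return (fit - h * Y) / (1.0 - h)

def holdout_predictions(fit, Y, H, idx):
    """Block generalization of the LOOCV shortcut to a hold-out fold S:
    f_{-S} = (I - H_SS)^{-1} (f_S - H_SS y_S), again without retraining."""
    Hss = H[np.ix_(idx, idx)]
    return np.linalg.solve(np.eye(len(idx)) - Hss, fit[idx] - Hss @ Y[idx])
```

A toy run on illustrative data: six-fold CV and LOOCV for three regularization values, all from the one decomposition.

```python
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 4))
y = rng.standard_normal((60, 1))
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # Gaussian kernel
folds = np.array_split(np.arange(60), 6)
for lam, a, fit, H in rls_eig(K, y, regparams=[0.01, 1.0, 100.0]):
    loo = loo_predictions(fit, y, H)
    cv = np.vstack([holdout_predictions(fit, y, H, f) for f in folds])
```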

Related articles

Efficient Hold-Out for Subset of Regressors

Hold-out and cross-validation are among the most useful methods for model selection and performance assessment of machine learning algorithms. In this paper, we present a computationally efficient algorithm for calculating the hold-out performance of sparse regularized least-squares (RLS) when the method has already been trained with the whole training set. The computational complexity of perform...

Oral Cancer Prediction Using Gene Expression Profiling and Machine Learning

Oral premalignant lesion (OPL) patients have a high risk of developing oral cancer. In this study, we investigate the use of machine learning techniques with gene expression profiling to predict the possibility of oral cancer development in OPL patients. Four classification techniques were used: support vector machine (SVM), regularized least squares (RLS), multi-layer perceptron (MLP) with back prop...

Robust Regularized Singular Value Decomposition with Application to Mortality Data

We develop a robust regularized singular value decomposition (RobRSVD) method for analyzing two-way functional data. The research is motivated by the application of modeling human mortality as a smooth two-way function of age group and year. The RobRSVD is formulated as a penalized loss minimization problem where a robust loss function is used to measure the reconstruction error of a low-rank m...

RLScore: Regularized Least-Squares Learners

RLScore is an open source Python module for kernel-based machine learning. The library provides implementations of several learners of the regularized least-squares (RLS) type. RLS methods for regression and classification, ranking, greedy feature selection, multi-task and zero-shot learning, and unsupervised classification are included. Matrix-algebra-based computational shortcuts are used to ensu...
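As a pointer to how the shortcuts discussed above surface in practice, here is a usage sketch modeled on RLScore's tutorial interface. The module paths and the `solve`/`leave_one_out` methods are recalled from its documentation and should be verified against the current release; the data are illustrative.

```python
import numpy as np
from rlscore.learner import RLS        # assumed module path; check the docs
from rlscore.measure import sqerror

# Illustrative random data; replace with a real problem.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 10)), rng.standard_normal(100)

learner = RLS(X, y)                    # train once on the full data
best_err, best_lam = float("inf"), None
for log_lam in range(-10, 11):
    lam = 2.0 ** log_lam
    learner.solve(lam)                 # re-solve cheaply for a new lambda
    err = sqerror(y, learner.leave_one_out())  # fast LOOCV, no retraining
    if err < best_err:
        best_err, best_lam = err, lam
```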

Adjusted regularized estimation in the accelerated failure time model with high dimensional covariates

We consider two regularization approaches, the LASSO and the threshold-gradient-directed regularization, for estimation and variable selection in the accelerated failure time model with multiple covariates based on Stute's weighted least squares method. The Stute estimator uses Kaplan-Meier weights to account for censoring in the least squares criterion. The weighted least squares objective fun...

Journal title:

Volume   Issue

Pages  -

Publication date: 2006